A COMPUTATIONAL LINGUISTICS APPROACH TO CONCEPTUAL INFORMATION PROCESSING



Research associate, Dept. of Educational & Psychological Research,
University of Lund-Malmö, Sweden

Introduction

The purpose of this paper is to present a linguistic model which can be
used for the processing of scientific information. The model was developed
through several years of research and experiments in information mediation,
structuring, organisation and dissemination in such social and humanistic
sciences that have been of interest to Swedish educational research
departments.

It has been apparent that a so-called "information overflow" and
"information explosion" are irrelevant concepts to designate the state of
affairs. There exists an ever growing data flow but a considerable lack of
information, for example among many professional categories requiring
information, such as university teachers and researchers. This so-called
"information frustration" is likely to depend on the meaning an individual
abstracts or infers from the data, e.g. in the interpretation of concepts
and conceptual relations mediated in document titles. In computerized IoD
search strategies there is a more [...] barrier involved. If the logic of
the computer or the system is unknown, there is a high risk of uncertainty
as to the real existence of the documents, which makes for an abstractness
of an even higher order. Moreover, when the information mediator
conceptualizes in a way different from the searcher, which is often the
case in social science research, it will prove difficult to arrive from
data to information.

There are many science fields dealing with information, e.g. library
science, information science, artificial intelligence, computational
linguistics and the emerging field of cognitive science. All of them are
concerned with the study of mechanisms that make possible representation of
information derived from symbols. Since computational linguistics focuses
on the study of language use aided by computers, there exists a link to
library and information science. For two or three decades great efforts
have been made in the field of linguistics, especially its computational
variety, to meet the needs in information science of structuring and
arriving at meaningful results in the processing of information. One
foremost goal for IoD systems should be the dynamic structuring, since
information is characterized by the structuring and re-structuring of data,
and thus subject to constant change. But there are strong traditions in
linguistics as well as in library science, which seem to survive, despite
the new technology of the electronic revolution. In fact, the symbol
handling machine contributes to the conservation of this tradition. In the
following I shall try to explain the reason why by means of some general
principles governing the structuring of information.

Some principles in the structuring of information

There are different forms of representation with different goals. If the
goal is, as it was once in the history of library organization, to store
"all" documented (written) information available, it seems natural that a
philosophy of universal classification lies behind the structuring of
items. A well-known and common type is made of the principle of hierarchy.
Such a system has only a small potential for quick and easy adaptation to
a particular structure in looking for new information. Concretely, once a
book is put on a shelf it is bound to its place. A more flexible
organization is provided by facet classification, through the lateral
relationships made explicit.

Recent efforts to structure information have been made through different
kinds of networks. The starting point was Ross Quillian's memory model
assuming that human cognition is associatively structured. The so-called
semantic network builds on concepts, represented as node labels which have
links to their attributes (properties). In information search systems
these networks are used in answering questions about facts in a data base.
Such a fact, e.g. a property of a certain concept, may not directly lead
to an answer to facts about a concept of higher order.

Later constructions of semantic networks make it clear that this
representation is founded on the classical-philosophical way of structuring
the world, similar to generic and other lexical-semantic relationships. It
may therefore be called static. Thus the network principle is based on
relationships within each concept. The structure does not account for
contextual relations such as they emerge in natural language sentences,
i.e. the dynamic properties of information. An attempt in this direction is
the PRECIS system which tries to build in structural relationships in the
coding but still seems to rely very much on traditional classification
principles.



In computer science, especially among those artificial intelligence
researchers dealing with data base management, there is a trend to use the
network principle in a restricted frame or context by the construction of
so-called topic hierarchies [2] in order to prevent the associations from
"exploding" in the information search procedure. The AI researchers are
mainly developing mechanisms for logical deduction, i.e. the application of
rules of inference to statements made in a formal language, whose
"semantics" is well specified. Thus since the intelligence of the computer
is restricted, dynamic structuring is not possible. It is an automatization
of the philosophy and logic built into classic information structuring,
and, unfortunately, the analogy made concerning theories of the structuring
and functioning of human cognition is one main reason for the maintenance
of the traditional reasoning in the construction of information processing
systems.

Simultaneously, another view of structuring, the schema principle, is
discussed among researchers in cognitive psychology. This model emanates
from Sir Frederic Bartlett's statements on the human memory as a
propositional structure [3]. As opposed to semantic networks whose
organizers have to account for how many "semantic primitives" are needed
for a synthetic formation of concepts, a structure based on schemata tries
to explore the advantages of an adaptively operating process. The schema
utilizes the relations between abstractions. This means that information
need not be explicit as in the network, but implicit, i.e. embedded in the
structure.

Formats of representation 

From a linguistic point of view a model based on propositions would be
represented as a Noun-V-Noun model, or in its more logical variety, as a
predicate-argument statement, V(N, ...). This representation format is
often used in the processing and computing of natural language data. The
inferences are then drawn to the semantic or the logical structure of the
text. The text is in these cases one sentence. However, when the purpose
is to inform about a sequence of events, as e.g. a scientific process, a
process oriented model is required. For Indo-European languages, the
Agent-action-Object paradigm seems to be general. This model is dynamic and
takes direction and intentionality into account and can be used in
psychological studies of information processed from whole texts. By means
of the AaO model latent dimensions in the information structure can be
detected; inferences are made to the relational structure between concepts
and ...


Theoretical starting-points for an information processing experiment

Starting with the assumption that the document title is the first and often
the only contact an information searcher has with a scientific text in the
process of judging its content, its language structure would be conceived
as being the communicative surface between author and reader. This
intermediate language is in the experiment assumed to represent an
abstraction of all "scientific events" reported in the text as a whole.

If scientific reporting is concerned with establishing causal connections
between events [6], those should be detected by means of a model whose
components are assigned variables representing scientific entities instead
of linguistic or psychological ones. Such a model has been developed by the
name of the Problem-method-Goal paradigm, assuming these concepts being
central to research work [7]. The PmG model thus represents an abstract
proposition and as such its characteristics differ in operationalization
from more natural-language adapted models. This may be illustrated with an
example from a transformational stage. From an interview about research
there could be a statement like

    I have analyzed titles for several months                          (1)

where the researcher is present (I) and tells about a process that has
happened (verb forms) and also for how long a time. When the same author
writes a report about his research the time specification will not be
present in the title (there is no "here and now") and his person is
implicit:

    An analysis of titles                                              (2)

A transformation has taken place. In order to mark the object of this
analysis the preposition "of" functions as pointer. The activity has been
transformed to one concept designating some kind of research or
investigation method. Since research does not handle concrete objects but
instead problems, it is appropriate to change the label "object" in favour
of the label "problem". Thus m and P have been discerned. The G component
is marked through the preposition "for". There is also an optional
component in the model when a research instrument is involved. The
preposition to mark instruments is "with". This is described in more detail
in I. Bierschenk [8].

The operationalization by means of an algorithm that automatically codes
titles in accordance with this conceptual information structure builds on a
few but general principles. I believe that these principles can be used to
explain the processing of natural language and the procedure of mediating
information between man and the machine. If the interpretation of language
data is based on cognitive functions such as they are studied by cognition
oriented scientists, much of the confusion between the outcome based on
formal logic and the logic of conception may disappear.

Among others [9], Piaget and Inhelder have made extensive experiments
showing that human cognition seems to be spatially organized. The child's
conception develops from a simple demarcation of objects to an
understanding of two-dimensional relations of several kinds. Adults learn
to relate multidimensional phenomena and seem to use these organizing
principles in a kind of coordinate system for the orientation in space and
time. The basic hypothesis underlying the algorithm is that there are cues
in the intermediate structure of the scientific title that relate the
concepts and organize them in such a way that a cognitive structure can be
detected (i.e. re-cognized). The prepositions are therefore used here as
functors, structuring the scientific conceptualization by means of an
unambiguous organization. Based on the intentionality of the PmG model and
the visually signalled explicitness of conceptual demarcation the following
computational linguistic model has governed the algorithmic analysis.

    Method --of--> Problem --with--> Instrument --for--> Goal
       in              in                in

Figure 1. General principles of an algorithm for conceptual recognition

Some prepositions point to concepts, indicating the direction (intention)
of the research activity (here called "intentional" prepositions). Others
procedurally demarcate, i.e. give an explicit specification of the concept
in question. This algorithmic definition of the relationship has created
the name "extensional" prepositions. A more thorough discussion is found
in I. Bierschenk [8], where also the rules for decoding at several levels
are given.
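
The functor reading of prepositions described above can be illustrated with
a minimal sketch in Python. It is not the algorithm used in the experiment:
the function name code_title, the two preposition tables, and the rule that
the leading segment counts as Method only when an intentional preposition
follows it are assumptions made for illustration. Only "of", "with", "for"
and "in" are taken from the text.

```python
# Assumed preposition inventories. The paper names "of", "with" and "for"
# as pointers to Problem, Instrument and Goal, and "in" as extensional.
INTENTIONAL = {"of": "Problem", "with": "Instrument", "for": "Goal"}
EXTENSIONAL = {"in"}


def code_title(title):
    """Split a title at intentional prepositions into PmG components.

    Returns a dict mapping component name to its text and the number of
    extensional prepositions found inside that component.
    """
    segments = [[None, []]]  # each entry: [pointer preposition, words]
    for token in title.split():
        if token.lower() in INTENTIONAL:
            segments.append([token.lower(), []])
        else:
            segments[-1][1].append(token)

    coded = {}
    for preposition, words in segments:
        if preposition is None:
            # Assumption: the leading segment is a Method only when an
            # intentional preposition follows; otherwise it is a Problem.
            component = "Method" if len(segments) > 1 else "Problem"
        else:
            component = INTENTIONAL[preposition]
        extensions = sum(1 for w in words if w.lower() in EXTENSIONAL)
        coded[component] = {"text": " ".join(words), "extensions": extensions}
    return coded
```

Under these assumptions, code_title("An analysis of titles") assigns
"An analysis" to Method and "titles" to Problem, which matches the reading
of title (2). A title that repeats an intentional preposition would
overwrite the earlier component; the sketch does not try to resolve that.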

Patterns in an experimental data base

For the testing of the algorithm a representative sample of 9,000
bibliographic descriptions has been used. The sample is based on a
definition of "researcher"; works produced by these researchers constitute
the scientific title collection.

According to the analytical model employed, the conceptualizations (as
represented by positions in the schema) may be extended in varying degrees.
Experimental results show that the most characteristic feature in this
science field is the Problem component, single or with one extension.
Because of the intentionality inherent in the model, this pattern should
be interpreted as expressing an implicit intentionality. If two components
appear it is most likely a Problem and a Method. The presence of the Method
should be interpreted as an explicit expression of intentionality in
research itself. The more complex the patterns are the less often they
appear. There also seems to be a tendency that many extensions prevent
"transitions" between components.

A validation of this conceptual structure has been performed by correlating
the pattern types with types of document. For example, textbooks and other
kinds of monographs are less complex than journal articles, which in turn
are less complex than research reports. Thus three general structures
appear in the material, namely (1) implicit intentionality, (2) implicit
intentionality + second degree extensionality, and (3) explicit
intentionality + first degree extensionality.
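
A small Python sketch may make the three pattern types concrete. It rests on
two assumptions that the text does not state outright: that the presence of
a Method component is what marks intentionality as explicit, and that the
"degree" of extensionality is the total number of extensions in the title.
The function name pattern_type and the sample records are hypothetical and
are not data from the experiment.

```python
from collections import Counter


def pattern_type(coded):
    """Classify a coded title.

    `coded` maps a component name to the number of extensions found on it.
    Returns (intentionality, degree of extensionality).
    """
    intentionality = "explicit" if "Method" in coded else "implicit"
    degree = sum(coded.values())
    return intentionality, degree


# Hypothetical hand-coded titles, for illustration only.
sample = [
    {"Problem": 0},               # single Problem component
    {"Problem": 2},               # Problem with two extensions
    {"Method": 0, "Problem": 1},  # Method and Problem, one extension
    {"Problem": 0},
]

# Frequency of each pattern type in the sample.
tally = Counter(pattern_type(coded) for coded in sample)
```

A tally of this kind is what would be correlated with document types; the
sketch only shows the bookkeeping, not the validation itself.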

Conceptual information in scientific titles

Apart from some titles (about .005%) with a low abstraction level compared
with expectancies and structural logic built into the algorithm, the
conceptual decoding has resulted in some data registers (files)
corresponding to the various components of the model. These registers are
now functionally related to each other because of the non-philosophical
classification. The schema model as a structuring principle also reveals
such dimensions that a manual analysis could have performed only with
difficulty. To illustrate the difference I would like to [...]

    Integration of children with [...]                                 (3)
         m             P          I

According to the model [...] the methods and techniques are incorporated in
the concept "integration"; the different steps to take are not explicit. A
linguistic analysis, when the interpretation model is embedded in natural
language [...], would regard the with-phrase [...]. The information model,
however, [...] the instrumental. In reality, this should be [...]
instrument. It is likely that the integration methodology is different
for those children. Therefore, it was of special interest in this report.

The schematic generality, because of the fixed positions of the components,
detects the variability, i.e. the variations among the values assumed under
each variable. Thus "integration" can be a practical way of handling the
children and also a method of study. Examples of variability of methods
generated are "research" (incorporating several actions), "reflections"
(a way of reporting one's result), "handbook" (a kind of methodological
strategy in educating research students). Consider also

    Goals for teacher training        Goals in teacher training        (4)
      m            G                             P

The concept "goals" is a method in the first case, representing activities
among these researchers involving goal description. It has a clear
methodological meaning within educational technology in which teacher
training is the overall goal. The preposition "for" recognizes the first
"goal" as a method, whereas the preposition "in" codes it as a problem,
since an explicit intentionality does not exist in the second case. Thus
goals in teacher training just specifies the context within which a certain
problem is dealt with. That "goals" is a noun is not of any import in the
functionally oriented registers. The author may discuss the same goals in
the two titles, but from different viewpoints, from different functional
domains. This also makes the following title functionally communicative
to the information searcher

School for the 80's (5) 

It was written 20 years ago when the 80's still belonged to the future.
The Method component is here given a broader meaning, since the school may
also be seen as an instrument. Method and instrument are components which
can form method-(instrument)-goal hierarchies in relation to the degree of
complexity in the desired goals. In order to reduce method and instrument
to a simple concept the term "means" is used. In the light of the
theoretical assumption and knowledge of this author's activities and field
of inquiry in Swedish educational research, I believe that the proposed
interpretation can be validated.
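
The idea of functionally oriented registers can be sketched in Python as one
lookup table per component. The records below are titles (4) and (5) coded
by hand following the discussion above; the function name build_registers
and the lower-casing of values are assumptions made for illustration, not a
description of the files produced in the experiment.

```python
from collections import defaultdict

# Titles (4) and (5), coded by hand as in the discussion above.
coded_titles = {
    "Goals for teacher training": {"Method": "Goals", "Goal": "teacher training"},
    "Goals in teacher training": {"Problem": "Goals in teacher training"},
    "School for the 80's": {"Method": "School", "Goal": "the 80's"},
}


def build_registers(coded):
    """Build one register per component: value -> list of titles."""
    registers = defaultdict(lambda: defaultdict(list))
    for title, components in coded.items():
        for component, value in components.items():
            # Assumption: values are compared case-insensitively.
            registers[component][value.lower()].append(title)
    return registers


registers = build_registers(coded_titles)
```

In this sketch the word "goals" is an entry of the Method register for the
first title only; in the second title it is part of a Problem entry. The
same noun thus lands in different functional domains, and listing the keys
of one register shows the variability of values under that variable.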